Domino: SAIC's English Entity-Linking System

Authors

Alan Buabuchachart

Parakh Jain

Ryan Murphy

Scott White

Leora Morgenstern

Abstract

The Domino system was SAIC’s student-intern entry to the English Entity-Linking track of the 2012 TAC-KBP competition. This paper describes how Domino was developed using components from the CUNY-BLENDER system and discusses the features and rules that were added to Domino. It analyzes Domino’s performance and suggests ways in which we plan to improve the system in the future.

1. Building the Domino Baseline System

1.1 Motivation and Constraints

Entity linking is a central task that analysts in the intelligence community (IC) often perform. For example, analysts must try to determine whether a person referred to in an intercepted email is the same person reported in a news article to have engaged in terrorist activity. SAIC supports IC analysts in many different ways and is interested in developing methods to help automate the entity-linking process. The IC entity-linking task has many similarities with the TAC-KBP Entity-Linking track, which focuses on linking named entities in news articles or blog posts to Wikipedia articles.

We are a group of students from the University of Maryland who spent the summer of 2012 at SAIC, together with our supervisor at SAIC. In mid-June 2012 we therefore decided to enter the TAC-KBP Entity Linking competition. Most of us, and in particular the developers among us, were undergraduates with little experience in Natural Language Processing. We also knew we had only two months to pull together a system. For both reasons, we decided to use existing resources as much as possible. Our aim was to take an existing entity-linking system and modify it to improve its output. We were especially interested in generalizing the entity-linking system so that it would be useful beyond Wikipedia’s domain. SAIC’s customers will often be interested in people who keep a low profile and who would be unlikely to have an entry in Wikipedia.
1.2 Harnessing Existing Resources

The researchers who developed CUNY-BLENDER, CUNY’s 2010 entry to multiple TAC-KBP tracks (entity linking and slot filling) [Chen et al., 2010], had made its codebase available to anyone who wanted to use it. We decided to build our system on top of the CUNY-BLENDER pipeline. Originally, we had envisioned that Domino would be a superset of CUNY-BLENDER. We had hoped to quickly get CUNY-BLENDER running, establish a baseline, and then spend most of our time experimenting with new features that would enhance Domino’s performance. In practice, we had to make significant adjustments and modifications to CUNY-BLENDER. As a result, Domino is based on CUNY-BLENDER but is neither a subset nor a superset of it.

1.3 Overview of CUNY-BLENDER

The architecture of the CUNY-BLENDER pipeline is shown in Figure 1.

Figure 1: CUNY-BLENDER’s architecture

The general procedure for entity linking is as follows:

Similar resources

WebSAIL Wikifier: English Entity Linking at TAC 2013

In this paper, we report on our participation in the English Entity Linking task at TAC 2013. We present the WebSAIL Wikifier system, an entity disambiguation system that links textual mentions to their referent entities in Wikipedia. The system uses a supervised machine learning approach and a string-matching clustering method, and scores 58.1% B+ F1 on the TAC 2013 test set.

The IBM Systems for English Entity Discovery and Linking and Spanish Entity Linking at TAC

This paper describes the IBM systems for English Entity Discovery and Linking (EDL) and Spanish Entity Linking (EL) for the TAC 2014 Knowledge-Base Population track. We submitted two runs for both the EDL and Spanish EL tracks. The linking component of the EDL system achieved an F1 score of 85.0 on the benchmark MSNBC dataset while the Spanish system achieved a B³+ F1 score of 73.6 on the TAC ...

LIA at TAC KBP 2012 English Entity Linking track

This paper describes our participation in the English Entity Linking task at KBP 2012.

CUNY BLENDER TAC-KBP2012 Entity Linking System and Slot Filling Validation System

This year the CUNY-BLENDER team participated in the English Entity Linking and Slot Filling Validation tracks. For entity linking, we apply two new techniques, collaborative clustering and query reformulation. For answer validation, we use a logistic regression model trained on within-system and cross-system features to re-rank the merged answer sets generated by individual systems. In this pape...

Creating and Curating a Cross-Language Person-Entity Linking Collection

To stimulate research in cross-language entity linking, we present a new test collection for evaluating the accuracy of cross-language entity linking in twenty-one languages. This paper describes an efficient way to create and curate such a collection, judiciously exploiting existing language resources. Queries are created by semi-automatically identifying person names on the English side of a ...

Publication date: 2012